Abstract: Given an IID sample from a positive distribution, we provide a
method for constructing rigorous finite sample lower confidence bounds for 
the expectation of the distribution. The method is based on constructing 
rigorous confidence regions for the cdf of the distribution. We provide some 
analysis of the asymptotical behavior of the rigorous LCBs. We apply the 
method to obtain an LCB for a particular, controversial, empirical data set, 
where the validity of standard methods has been called into question. 
1. Introduction

In general, given an IID sample from an unknown distribution, no rigorous 
confidence bounds can be provided for the expectation of the distribution. The 
possibility of the existence of a very low probability tail with some extreme 
value which has a large impact on the expectation can never be ruled out, or 
even made improbable by any finite sample. For example, with a sample of
size $n$, the existence of an atom with probability of order $1/n$ and an
arbitrary value, with magnitude large enough to impact the expected value
greatly, cannot be deemed unlikely. Indeed, it may be that an expectation does
not even exist, or has infinite magnitude.

The normal practice for estimating expectations is to ignore the possibility 
of the existence of low probability, extreme value tails, apply estimators with 
known asymptotical properties, and, in effect, assume that those properties are 
valid for the given sample size. Alternatively, rigorous confidence bounds, such 
as the Chebychev or Chernoff bounds, can be derived when certain moments are 
assumed to be bounded or when the distribution itself is assumed to lie within 
a bounded interval. 

Here, we use a weaker assumption, namely, that the distribution is over pos- 
itive numbers only, but aim at deriving only lower confidence bounds rather 
than a confidence interval. The same argument as above shows that without 
additional assumptions, no upper confidence bounds for the expectation can be
established. 

In addition to providing a guaranteed finite sample confidence level for any 
underlying distribution over the positive reals, the confidence bounds proposed 
have the pleasing property that they are monotonic in the order statistics of the 
sample. This eliminates the paradoxical phenomenon, which can occur with a 
normal-theory LCB, in which a positive outlier in the sample lowers the nominal 
LCB for the expectation. 

2. Setup and theorem 

Let $X, X_1, \ldots, X_n$ be IID variables from an unknown distribution over the real
numbers, with $P(X > 0) = 1$. Let $X_{(1)}, \ldots, X_{(n)}$ be the order statistics of
$X_1, \ldots, X_n$. We wish to derive lower confidence bounds for the expectation of
$X$: $B = B(X_1, \ldots, X_n)$ such that $P(B > EX) < \alpha$. We rely on the following
theorem. The theorem establishes an LCB for $EX$ as a consequence of simultaneous
UCBs for the cdf of $X$ at the sample points.

Theorem 1. Let $U = (U_{(1)}, \ldots, U_{(n)})$ be the vector of the order statistics of $n$
independent samples from a $U[0,1]$ distribution. For any vector $u = (u_1, \ldots, u_n) \in [0,1]^n$, define

$$p_u = P(U_{(1)} < u_1, \ldots, U_{(n)} < u_n).$$

Define

$$B_u = \sum_{k=1}^{n} (1 - u_k)\,(X_{(k)} - X_{(k-1)}) \qquad (1)$$

$$\phantom{B_u} = \sum_{k=1}^{n} (u_{k+1} - u_k)\,X_{(k)}, \qquad (2)$$

where $u_{n+1} = 1$ and $X_{(0)} = 0$. Then $B_u$ is a level-$p_u$ LCB for the expectation
of $X$.

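Proof: The second equality follows from rearranging the terms of the sum.
We prove the claim for the first expression. Let $F^-(x)$ be the left-continuous cdf of the random
variable $X$, i.e.,

$$F^-(x) = P(X < x).$$

Then for any monotonically non-decreasing vector $x = (x_1, \ldots, x_n) \in \mathbb{R}_+^n$,

$$\sum_{k=1}^{n} (1 - F^-(x_k))\,(x_k - x_{k-1}) \le EX,$$

where $x_0 = 0$. This holds because $EX = \int_0^\infty P(X \ge t)\,dt$ and
$P(X \ge t) \ge 1 - F^-(x_k)$ for all $t \le x_k$.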

The random variables $F^-(X_1), \ldots, F^-(X_n)$ are independent, identically distributed
variables such that $P(F^-(X_i) < x) \ge x$. Therefore, there exists a set
of IID $U[0,1]$ random variables $U_1, \ldots, U_n$, such that $U_i \ge F^-(X_i)$, $i = 1, \ldots, n$.
It follows that for any vector $u = (u_1, \ldots, u_n) \in [0,1]^n$ the probability of the
event

$$\{F^-(X_{(1)}) < u_1, \ldots, F^-(X_{(n)}) < u_n\}$$

is at least $p_u$. On this event,

$$B_u \le \sum_{k=1}^{n} (1 - F^-(X_{(k)}))\,(X_{(k)} - X_{(k-1)}) \le EX. \qquad \square$$
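As an illustration of Theorem 1, the two expressions for $B_u$ can be evaluated directly from a sample and a parameter vector. The following is a minimal Python sketch, not code from the paper: the function names `lcb_increments` and `lcb_weights` and the toy data are assumptions made for illustration only.

```python
def _sorted_checked(sample, u):
    """Sort the sample and check that u has one entry per observation."""
    xs = sorted(sample)
    if len(xs) != len(u):
        raise ValueError("u must have the same length as the sample")
    if xs and xs[0] <= 0:
        raise ValueError("the bound assumes a positive sample")
    return xs


def lcb_increments(sample, u):
    """Form (1): sum_k (1 - u_k) * (X_(k) - X_(k-1)), with X_(0) = 0."""
    xs = _sorted_checked(sample, u)
    total, prev = 0.0, 0.0
    for x, uk in zip(xs, u):
        total += (1.0 - uk) * (x - prev)
        prev = x
    return total


def lcb_weights(sample, u):
    """Form (2): sum_k (u_{k+1} - u_k) * X_(k), with u_{n+1} = 1."""
    xs = _sorted_checked(sample, u)
    u_ext = list(u) + [1.0]
    return sum((u_ext[k + 1] - u_ext[k]) * xs[k] for k in range(len(xs)))


# Toy data (hypothetical): both forms should give the same value.
sample = [3.0, 1.0, 2.0]
u = [0.5, 0.75, 1.0]
print(lcb_increments(sample, u), lcb_weights(sample, u))
```

Form (2) makes the monotonicity property mentioned in the introduction visible: every order statistic enters with a non-negative weight whenever $u$ is non-decreasing, so raising any observation can only raise the bound.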



3. Tunable families of bound parameter vectors 

Theorem 1 implies that each vector $u \in [0,1]^n$ defines a level-$p_u$ LCB, $B_u$,
for $EX$. Different choices of the parameter vector $u$ result in different LCBs.
It is convenient to construct families of parameter vectors in such a way that
from each family a vector can be chosen to match a desired level of confidence:
Let $\Lambda$ be a closed subset of $\mathbb{R}$. Define a tunable family of parameter vectors,
$\mathcal{U} = \{u^\lambda \in [0,1]^n : \lambda \in \Lambda\}$, to be a set of vectors which is parameterized
continuously by $\lambda$ and increasing monotonically in each coordinate to 1. That
is, for all $i = 1, \ldots, n$, the following hold:

• If $\lambda_1 < \lambda_2$ then $u_i^{\lambda_1} \le u_i^{\lambda_2}$.

• If $\lim_k \lambda_k = \lambda$ then $\lim_k u_i^{\lambda_k} = u_i^{\lambda}$.

Then for each $\alpha$, $0 < \alpha < 1$, there exists a unique $\lambda(\alpha)$ such that

$$\lambda(\alpha) = \min \{ \lambda \in \Lambda : p_{u^\lambda} \ge 1 - \alpha \}.$$

Given a desired confidence level $1 - \alpha$, or a set of confidence levels $1 - \alpha_1, 1 - \alpha_2, \ldots, 1 - \alpha_k$,
the corresponding $\lambda(\alpha)$, or $\lambda(\alpha_1), \lambda(\alpha_2), \ldots, \lambda(\alpha_k)$, can be
determined numerically, to any desired precision, using simulation.
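The simulation step can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the procedure used in the paper: `estimate_p_u`, `tune_lambda` and `offset_family` are hypothetical names, the search is a plain bisection over $[0, 1]$, and the example family is the offset-type family described in the examples that follow.

```python
import random


def estimate_p_u(u, n_sims=20000, seed=0):
    """Monte Carlo estimate of p_u = P(U_(1) < u_1, ..., U_(n) < u_n)."""
    rng = random.Random(seed)
    n = len(u)
    hits = 0
    for _ in range(n_sims):
        order_stats = sorted(rng.random() for _ in range(n))
        if all(s < uk for s, uk in zip(order_stats, u)):
            hits += 1
    return hits / n_sims


def tune_lambda(family, alpha, lo=0.0, hi=1.0, tol=1e-3, **mc_kwargs):
    """Bisection for (approximately) the smallest lambda whose estimated
    p_u is at least 1 - alpha. Assumes family(hi) reaches that level."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if estimate_p_u(family(mid), **mc_kwargs) >= 1.0 - alpha:
            hi = mid
        else:
            lo = mid
    return hi


def offset_family(n):
    """Example family: u_i = min(1, i/(n+1) + lambda), i = 1..n."""
    return lambda lam: [min(1.0, i / (n + 1) + lam) for i in range(1, n + 1)]


lam = tune_lambda(offset_family(10), alpha=0.05, n_sims=5000)
print(lam)
```

Reusing the same seed for every evaluation means all candidate vectors are scored on the same simulated uniforms, so the estimated level is non-decreasing in $\lambda$ and the bisection is well behaved. The result carries Monte Carlo error, which shrinks as `n_sims` grows.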

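Examples. An infinite variety of tunable parameter vector families exist. A
very simple family is defined by adding an offset to the vector $\left(\frac{1}{n+1}, \ldots, \frac{n}{n+1}\right)$:

$$\mathcal{U}_{OFF} = \left\{ \left( \min\left(1, \tfrac{1}{n+1} + \lambda\right), \ldots, \min\left(1, \tfrac{n}{n+1} + \lambda\right) \right) : \lambda \in [0, 1] \right\}.$$

When using this family, the LCB assumes the form

$$LCB_{OFF} = \frac{1}{n+1} \sum_{i=1}^{n_\lambda - 1} X_{(i)} + \left( 1 - \lambda - \frac{n_\lambda}{n+1} \right) X_{(n_\lambda)},$$

where $n_\lambda = \lfloor (n+1)(1-\lambda) \rfloor$.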
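The closed form for the offset family can be checked numerically against the general expression (2) of Theorem 1. The sketch below is an illustration only: the function names are hypothetical, and the closed form is written with $n_\lambda$ taken as the largest index $i$ with $\frac{i}{n+1} + \lambda \le 1$, which is an assumption about the intended rounding.

```python
import math


def offset_vector(n, lam):
    """Offset-family vector: u_i = min(1, i/(n+1) + lam), i = 1..n."""
    return [min(1.0, i / (n + 1) + lam) for i in range(1, n + 1)]


def lcb_general(sample, u):
    """Form (2) of Theorem 1: sum_k (u_{k+1} - u_k) * X_(k), u_{n+1} = 1."""
    xs = sorted(sample)
    u_ext = list(u) + [1.0]
    return sum((u_ext[k + 1] - u_ext[k]) * xs[k] for k in range(len(xs)))


def lcb_offset_closed_form(sample, lam):
    """Closed form for the offset family (assumed rounding: floor)."""
    xs = sorted(sample)
    n = len(xs)
    n_lam = math.floor((n + 1) * (1.0 - lam))
    if n_lam < 1:
        return 0.0  # every coordinate is clipped at 1, so all weights vanish
    if n_lam > n:
        return sum(xs) / (n + 1)  # lam == 0: no clipping at all
    head = sum(xs[: n_lam - 1]) / (n + 1)
    return head + (1.0 - lam - n_lam / (n + 1)) * xs[n_lam - 1]


# Hypothetical toy sample; the two computations should agree for every lam.
sample = [0.5, 2.0, 1.0, 4.0, 3.0]
for lam in (0.0, 0.1, 0.2, 0.37, 0.9):
    print(lam, lcb_general(sample, offset_vector(len(sample), lam)),
          lcb_offset_closed_form(sample, lam))
```

In words, the offset bound is a trimmed sum: the smallest observations each receive weight $\frac{1}{n+1}$, one boundary observation receives a partial weight, and the largest observations receive weight zero.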

Another family results from using confidence bounds for the beta distribution.
Let $B(a, b, p)$ be a level-$p$ UCB for the beta distribution with parameters $a$
and $b$. The construction of this family is intuitively motivated by the fact that
for continuous IID random variables $X_1, \ldots, X_n$ the marginal distribution of
$F(X_{(i)})$ is a beta distribution with parameters $i$ and $n - i + 1$ for all $i = 1, \ldots, n$.
Define:

$$\mathcal{U}_{BETA} = \{ (B(1, n, \lambda), B(2, n - 1, \lambda), \ldots, B(n, 1, \lambda)) : \lambda \in [0, 1] \}.$$
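A member of the beta family can be computed without special libraries by using the identity $P(U_{(i)} \le x) = P(\mathrm{Binomial}(n, x) \ge i)$ and inverting it by bisection. The following Python sketch is an illustration under that approach; the helper names and the argument order `beta_ucb(i, n, p)` are assumptions, and a library quantile routine such as `scipy.stats.beta.ppf` would normally serve the same purpose.

```python
import math


def beta_cdf_order_stat(x, i, n):
    """CDF at x of Beta(i, n - i + 1), i.e. P(U_(i) <= x),
    computed as the binomial tail P(Binomial(n, x) >= i)."""
    return sum(math.comb(n, j) * x**j * (1.0 - x) ** (n - j)
               for j in range(i, n + 1))


def beta_ucb(i, n, p, tol=1e-12):
    """Level-p UCB, i.e. the p-quantile of Beta(i, n - i + 1), by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_order_stat(mid, i, n) < p:
            lo = mid
        else:
            hi = mid
    return hi


def beta_family_vector(n, lam):
    """Beta-family vector (B(1, n, lam), ..., B(n, 1, lam))."""
    return [beta_ucb(i, n, lam) for i in range(1, n + 1)]


print(beta_family_vector(5, 0.9))
```

Note that $\lambda$ here is the marginal level of each coordinate, not the simultaneous level $p_u$; the value $\lambda(\alpha)$ that delivers a simultaneous level of $1 - \alpha$ still has to be found by simulation.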



4. Some asymptotical analysis 

Analysis of the asymptotical behavior of the rigorous LCB is facilitated by the 
Donsker property of the empirical process (see, for example, [1], chapter 2.1). 
The Donsker property implies that for continuous distributions the centered 
and scaled empirical process converges in distribution to the standard Brownian 
bridge. The centered and scaled empirical process, $(B_n(t), 0 \le t \le 1)$, is defined
as:

$$B_n(t) = \frac{1}{\sqrt{n}} \left( \sum_{i=1}^{n} 1\{F(X_i) \le t\} - nt \right).$$



This property can be used directly to calculate the asymptotical behavior of the
rigorous bound obtained when using the offset family, $LCB_{OFF}$, as we do below.
Asymptotical analysis of the rigorous LCB for other families such as $LCB_{BETA}$
would be more complex.

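The distribution of $M$, the supremum of the Brownian bridge, is

$$P(M > x) = e^{-2x^2},$$

putting the $1 - \alpha$ quantile of the distribution, $q_\alpha$, at $\sqrt{\frac{1}{2} \log \frac{1}{\alpha}}$. When using the
offset family with a sample of size $n$, the member selected for a $1 - \alpha$ LCB will
be approximately $\left( \min\left(1, \frac{1}{n+1} + \frac{q_\alpha}{\sqrt{n}}\right), \ldots, \min\left(1, \frac{n}{n+1} + \frac{q_\alpha}{\sqrt{n}}\right) \right)$. If the distribution of $X$ has a finite
mean, $LCB_{OFF}$ is equal to

$$LCB_{OFF} = \sum_{k=1}^{n} (u_{k+1} - u_k)\,X_{(k)}, \qquad u_k = \min\left(1, \frac{k}{n+1} + \frac{q_\alpha}{\sqrt{n}}\right), \quad u_{n+1} = 1,$$

its expected value is

$$E\,LCB_{OFF} = E\,X 1\left\{ X < F^{-1}\left( 1 - \frac{q_\alpha}{\sqrt{n}} \right) \right\} + \cdots,$$

and so,

$$EX - E\,LCB_{OFF} = E\,X 1\left\{ X > F^{-1}\left( 1 - \frac{q_\alpha}{\sqrt{n}} \right) \right\} + \cdots.$$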



The first term on the right is the integral of the tail of the distribution. If
$EX$ is finite, this term approaches zero as $n$ increases, guaranteeing that the
LCB is consistent. However, the convergence may be arbitrarily slow unless
some additional assumptions regarding the distribution of $X$ are made. The
convergence is $O(n^{-1/2})$ if and only if $X$ is bounded almost surely. Using the
Hölder inequality it can be shown that if $EX^{r+\epsilon} < \infty$ for some positive $r$ and
$\epsilon$ then the convergence is $o(n^{-\frac{1}{2} + \frac{1}{2r}})$.
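The Hölder step for the tail term can be spelled out as follows. This is a sketch of one way to obtain the stated rate, not the paper's own derivation. It assumes $p = r + \epsilon > 1$, writes $c_n = F^{-1}(1 - q_\alpha/\sqrt{n})$ for the truncation point (notation introduced here), and uses $P(X > c_n) \le q_\alpha/\sqrt{n}$.

```latex
\begin{align*}
E\,X 1\{X > c_n\}
  &\le \left( E X^{p} \right)^{1/p} \, P(X > c_n)^{1 - 1/p}
     && \text{(H\"older, exponents } p \text{ and } p/(p-1)) \\
  &\le \left( E X^{p} \right)^{1/p} \left( \frac{q_\alpha}{\sqrt{n}} \right)^{1 - 1/p} \\
  &= O\!\left( n^{-\frac{1}{2} + \frac{1}{2p}} \right)
   = o\!\left( n^{-\frac{1}{2} + \frac{1}{2r}} \right),
     && \text{since } p = r + \epsilon > r .
\end{align*}
```

The more moments $X$ has, the closer the rate of the tail term gets to $n^{-1/2}$, which matches the bounded case as the limit $r \to \infty$.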

Of course, for the normal theory LCB, $LCB_{Normal}$,

$$\sqrt{n}\,(EX - E\,LCB_{Normal}) = \Phi^{-1}(1 - \alpha/2) \sqrt{\mathrm{Var}X},$$

guaranteeing $O(n^{-1/2})$ convergence if $\mathrm{Var}X$ is finite, but no convergence if $\mathrm{Var}X$ is infinite.

Thus, asymptotically, the normal theory LCB converges faster than the rigorous
LCB whenever $\mathrm{Var}X$ exists (unless $X$ is bounded, in which case both
LCBs converge as $O(n^{-1/2})$), but the rigorous LCB guarantees convergence to
$EX$ when $\mathrm{Var}X$ is infinite, i.e., in situations in which the normal-theory LCB
diverges in expectation.

5. Application to the Lancet study of mortality in Iraq 

The results above provide a method for generating theoretical rigorous lower 
confidence bounds for the expectation of a positive random variable. These 
bounds can be applied in situations where conventional methods for producing 
confidence bounds are challenged based on the fact that the validity of those 
methods relies on asymptotical analysis which may not hold for samples of a 
given size from a particular distribution.

One such case is the politically sensitive estimate of mortality in Iraq following
the U.S.-led invasion. In 2006 a group of researchers from the Johns Hopkins
Bloomberg School of Public Health carried out a survey among households in
Iraq aimed at estimating mortality [2]. They provided a point estimate of about
601,000 violent deaths in Iraq for the period March 2003 to July 2006 and
a 95% confidence interval of 426,000-794,000. Due to the potential political
implications of the findings, the study received intense scrutiny. Most of the
attention was focused on the various potential biases introduced into the data
collection process by a methodology constrained by the conditions in Iraq (see a
summary of such points of criticism in [3]). In addition, however, and despite the
fact that the estimation procedure used was apparently identical to that used
in similar studies, some criticism was made of the estimation procedure itself.
Doubts were voiced as to whether the normal theory 95% confidence interval
did indeed have its nominal probability of coverage and it was suggested that
a true 97.5% LCB would be drastically lower than the left endpoint of the
interval [4].

We follow here the treatment of Mark van der Laan [4]. He uses a somewhat
stylized setup in which the death counts in the 47 clusters in the sample
collected by Burnham et al. are assumed to be IID samples from an unknown
distribution of violent deaths in household clusters in Iraq.¹ Each cluster
contains 40 households, so under van der Laan's setup the unknown mean of the
distribution is 40 times the mean number of violent deaths per household in
Iraq.

¹In reality, the sample was stratified geographically by governorates.

Van der Laan provides the death counts in the 47 clusters as follows:

The sample mean is 6.4 and the sample standard deviation is 8.3, giving a
classical normal-theory 97.5% LCB for the expectation of the distribution of
4.0, i.e., 63.0% of the sample mean. Employing the method above, we obtain a
rigorous LCB for the expectation of 2.3 (36.5% of the sample mean) using the
offset family and 2.8 (43.8% of the sample mean) when using the beta family.²
We therefore note that while the rigorous LCBs constructed here are signifi- 
cantly lower than the nominal normal-theory LCB, they are not dramatically 
different (reducing the bound by about one third). This suggests that using 
such a technique can be useful when dealing with certain situations in which 
the validity of traditional methods may be called into doubt. 
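The normal-theory figure quoted above can be reproduced from the reported summary statistics alone. The short Python sketch below does this, and also evaluates the asymptotic offset $q_\alpha/\sqrt{n}$ from Section 4 for $n = 47$ and $\alpha = 0.025$. The variable names are illustrative, and the asymptotic offset is only a rough guide; it is not the simulation-tuned $\lambda(\alpha)$ behind the rigorous bounds reported here.

```python
import math
from statistics import NormalDist

# Summary statistics as reported in the text (rounded values).
n, mean, sd = 47, 6.4, 8.3
alpha = 0.025  # one-sided level for a 97.5% LCB

# Classical normal-theory LCB: mean - z * sd / sqrt(n).
z = NormalDist().inv_cdf(1.0 - alpha)
normal_lcb = mean - z * sd / math.sqrt(n)

# Asymptotic approximation to the offset: q_alpha / sqrt(n),
# with q_alpha the (1 - alpha) quantile of the Brownian bridge supremum.
q_alpha = math.sqrt(0.5 * math.log(1.0 / alpha))
approx_offset = q_alpha / math.sqrt(n)

print(round(normal_lcb, 2), round(approx_offset, 3))
```

With these inputs the normal-theory bound comes out at about 4.0, in line with the text, while the approximate offset of roughly 0.2 indicates that about the top fifth of the order statistics receive no weight in the offset-family bound.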

Figure 1 demonstrates the construction of the LCBs graphically. It shows 
the empirical cdf together with lines signifying the boundaries of the confidence 
regions established for the cdf using the offset family (dotted line) and the beta 
family (dashed line). The LCBs for the expectation are the areas to the left of 
and above those two curves. 

6. Further research 

One point associated with the method presented that may merit further research 
regards the choice of tunable family. Are some families better - i.e., yield tighter 
bounds - than others, across all possible distributions? Can families be chosen 
so as to match various properties of the distribution? 

Another avenue of research would be to produce extensions of the method 
in order to make it applicable to a wider variety of situations. One desirable 
extension would be to cover cases where the sample is stratified, while another
would be to cover cases where random censoring occurs.



²Employing a simple linear relationship between the expectation of deaths in a cluster and
the total number of deaths in the population, these bounds would correspond to LCBs for the
total number of violent Iraqi deaths of 219,000 (offset family) and 263,000 (beta family) in
the period covered by the survey.